Introduction

This project is an extension of the analysis of my facebook ego network as worked on in class. In addition to the in-depth analysis of the personal network, I also carried out an analysis between a few anomalyzed ego networks.

Obtaining Data

Documentation

My personal facebook ego network data is obtained through the Getnet app with instructions posted in assignment 1 of the SNA class. The data is downloaded in .gml format.

The other anomalyzed data sets were obtained from Stanford’s collection of social network data. There are 10 anomalyzed ego network data in total. We make use of these by comparing the properties with my personal network (McAuley and Leskovec 2012).

Explanation of Criteria for Including Nodes and Edges

The nodes represent the friends from the corresponding individual’s facebook ego network data. The edges represent the mutual friend relationships. I believe there may have been some munging/filtering of data in the stanford ego networks.

Subjective interestingness/originality of the subject of data collection

In addition to the raw data from my facebook ego network, I’ve manually inserted “test” labels of how I met each individual friend in my network. Part of the goal of this project is to see whether different community detection algorithms could pick up the correct clusters corresponding to how I’ve met them in person. For example, from playing Ultimate Frisbee, high school, UBC stats department, etc.

In addition, the idea of including additional ego network data came from Dr. Lada Adamic’s reply in this thread. Therefore, the ensemble of ego network is usually of interest also.

Data Analysis

Stats on Personal Network

The following step-by-step code makes use of some helper functions created to generate statistics. It is not included here to minimize visual clutter.

Load in the following packages and helper functions

library(igraph)
library(RColorBrewer)
library(plyr)
library(dplyr)
library(xtable)
library(gridExtra)

source("count.degree.distribution.R")
source("computeStats.R")

Read in the Graph

fbNetworkFile = "facebook_with_relation.gml"
G = read.graph(file=fbNetworkFile, format="gml")

The “test labels” I mentioned earlier are called relations, that is, how I first met a particular friend. Assign a colour to each relation.

pairedColors = c(brewer.pal(n=12, name="Paired"))
names(pairedColors) = c("ultimate-rec", "ultimate-competitive", "ubc-cpsc", 
                        "ubc-stat", "ubc-event", "closest-friends", "soccer", 
                        "hockey", "toys-r-us", "environment-canada", 
                        "high-school", "relative")
V(G)$color = revalue(V(G)$relation, pairedColors)

Graph Statistics

Compute some graph statistics on personal network

outGraphStats = computeGraphStats(G)

Average shortest path: This is the average number of steps along the shortest paths for all possible pairs of network nodes.

Cluster coefficient: Degree to which nodes in a graph tend to cluster together.

Local Cluster coefficient: The fraction pairs of neighbours of the node that are themselves connected. This is the local cluster coefficient for myself.

  • Average shortest path: 2.3542873
  • Cluster coefficient: 0.5481617
  • Local cluster coefficient: 0.1321006

Let’s compare this with a erdos-renyi random graph to see how it compares with my ego network

First simulate an erdos-renyi random graph is same number of nodes and edges

erdo = erdos.renyi.game(length(V(G)), p.or.m=length(E(G)), type="gnm")

Compute some graph statistics on erdos-renyi graph

gsErdo = computeGraphStats(erdo)
  • Average shortest path: 1.8768043
  • Cluster coefficient: 0.1331016
  • Local cluster coefficient: 0.1321006

Interpretation Both the average shortest path and cluster coefficient for the ego network is both higher than the random graph which is interesting. The ego network has several dense clusters, and many cliques within clusters as we shall see later. Therefore, the cluster coefficient is high. Since, there are nodes with very few links may have contributed to the higher average shortest path.

Individual Statistics

Here, we compute individual statistics about the nodes in the graphs. I’ll only only be computing these for a few people from every relation (or “test” labels).

Degree: The number of mutual friends.

Betweeness: The number of shortest paths from all the nodes to all others that pass through that node.

Closeness: The length of the average shortest path between a node and all other nodes in the network.

Pick some names to analyze

nameList = c("Yuji Aizawa", "Jasper Lu", "Rhona Yue", "Kevin Underhill",
          "Esther Fann", "Sean Montgomery", "Tyki Sueyoshi", "Louisa Lau",
          "Ellery Lee", "Jonathan Baik", "Alex Tan", "Andrew Brear",
          "Angela S", "Simon Tai")

Compute individual statistics

nodeStats = computeNodeStats(G)
nodeStats = cbind("name"=V(G)$label, "relation"=V(G)$relation, nodeStats)
nodeStats = data.frame(nodeStats)
nodeStats = nodeStats %>% filter(name %in% nameList) %>% 
              arrange(desc(betweenness))
print(xtable(nodeStats), comment=F, type="html", include.rownames=F)
name relation degree betweenness closeness
Yuji Aizawa closest-friends 107.00 4702.97 0.00
Jasper Lu ultimate-rec 88.00 1040.79 0.00
Ellery Lee closest-friends 27.00 508.51 0.00
Rhona Yue ubc-event 20.00 405.73 0.00
Esther Fann environment-canada 13.00 389.40 0.00
Tyki Sueyoshi ubc-stat 8.00 240.74 0.00
Simon Tai ubc-stat 37.00 204.92 0.00
Sean Montgomery ultimate-competitive 67.00 150.26 0.00
Jonathan Baik environment-canada 9.00 101.45 0.00
Kevin Underhill ultimate-competitive 56.00 63.87 0.00
Andrew Brear ultimate-competitive 28.00 16.73 0.00
Alex Tan ubc-cpsc 4.00 0.05 0.00
Angela S toys-r-us 2.00 0.00 0.00
Louisa Lau relative 2.00 0.00 0.00


Interpretation

The table is ordered by highest betweenness. Yuji Aizawa is one of my closest friends. It appears that he has both the highest number of mutual friends and betweenness. This table also shows that a high number of degree does not imply that betweenness is high. Moreover, my closer friends have generally high betweeness.

Visualizations

First filter out the labels to show only the names of interest

V(G)$label.cex = 1
labelsG = V(G)$label
labelsG[!(labelsG %in% nameList)] = NA

**The layout below is a great default layout for large graphs. Here is where I found it. In addition, the betweenness property is encoded by node size. Whereas, the relation is encoded with colour.

opar <- par()$mar; par(mar=rep(0, 4))
layout <- layout.fruchterman.reingold(G, niter=500, area=vcount(G)^2.3, 
            repulserad=vcount(G)^2.8)
myPlot = plot(G, layout=layout, vertex.size=log(betweenness(G) + 1), 
          vertex.label=labelsG, vertex.label.color="black")
legendLabels = unique(V(G)$relation)
legendColours = unique(V(G)$color)
legend("topleft", legend=legendLabels, col=legendColours, pch=19, 
  bty="n", cex=.8)


Interpretation

The algorithm layout (force-directed type?) algorithm did a fairly good job at placing the nodes on the graph in respect to the true relation labels. That is to say, we can clearly discriminate the different groups. Of course, there may have been a small bias while manually labelling the points. However, most relation reflect my first encounter with the person, so the bias is minimized in that respect. What stood out to me was, Simon Tai appears to be clustered with my friends whom play ultimate frisbee. However, I know him because of my relation with him within the stats department at ubc. It turns out he plays ultimate frisbee recreationally and has a lot of friends in common that also plays ultimate.


Now let’s take a look at what the erdos-renyi random graph look like.

labelsG = NA
erdo = erdos.renyi.game(length(V(G)), p.or.m=length(E(G)), type="gnm")
erdoPlot = plot(erdo, layout=layout, vertex.size=log(betweenness(erdo)+1), 
              vertex.label=labelsG)
title("Erdos-Renyi Random Graph")

Community Detection

We’ll try 2 different community detection algorithms: modularity and walk-trap. Modularity algorithm considers edges that fall within a community or between a community and the rest of the network. Walk-trap algorithm find communities through random walks. The idea behind walk-trap is that short random walks tend to stay in the same community. (This will count towards having tried out a method not introduced in class)

Execute community finding algorithms for personal network and erdos-renyi graphs

mc = fastgreedy.community(G)
mcErdo = fastgreedy.community(erdo)

wtc = walktrap.community(G, steps=4)
wtcErdo = walktrap.community(erdo, steps=4)

Plotting details are omitted here, but the code can be found in plotCommunites.R

Modularity score community finding algorithm

My Ego Network

Erdos-Renyi

Walk-trap community finding algorithm

My Ego Network

Erdos-Renyi

Interpretation The erdos-renyi graph does not exhibit any interesting communities after performing the modularity and walk-trap algorithms. However, the ego network definitely has out some interesting communities. In particular the walk-trap algorithm detected the cluster people in the ego network that plays ultimate. Moreover, it grouped together my closest friends with the people who I met at a ubc event. It found the group of people I worked at toys-r-us. It trivially detected my relatives, since they have no connection with any other person in the graph.

Stats Including Stanford Anomalyzed Ego Networks

Graph Stats

# compute graph statistics similarly as for my ego network
getGraphStatsEgoS = function(egoStanfordFile){
  egoS = read.graph(egoStanfordFile, directed=F)
  egoS = simplify(egoS, remove.multiple=T, remove.loops=T)
  egoSstr = sub("\\.edges", "", basename(egoStanfordFile))
  out = data.frame("network"=paste0("stanford-ego_",egoSstr), 
          as.data.frame(computeGraphStats(egoS)))
  return(out)
}
# find stanford anomalyzed ego networks
egoStanfordFiles = list.files("facebook_stanford", "edges", full.names=T)
graphStatsDf = ldply(egoStanfordFiles, .fun=getGraphStatsEgoS)
# combine the graph stats information in a dataframe
temp = rbind(as.data.frame(c("network"="my-ego_network", outGraphStats)), 
        as.data.frame(c("network"="erdo_network", gsErdo)))
graphStatsDf = rbind(temp, graphStatsDf)
graphStatsDf = graphStatsDf %>% arrange(desc(transitivity))
network avgShortestPath transitivity localClusterCoefG
stanford-ego_1912 2.56 0.70 0.01
stanford-ego_698 1.93 0.66 0.00
stanford-ego_414 2.69 0.65 0.01
my-ego_network 2.35 0.55 0.13
stanford-ego_107 2.95 0.50 0.01
stanford-ego_348 2.52 0.49 0.02
stanford-ego_686 2.43 0.45 0.00
stanford-ego_1684 3.04 0.45 0.00
stanford-ego_3980 2.55 0.45 0.00
stanford-ego_3437 3.45 0.45 0.00
stanford-ego_0 3.75 0.43 0.04
erdo_network 1.88 0.13 0.13


Interpretation The table was arranged with respect to the transitivity, also known as clustering coefficient. It turns out that my ego network is third highest in terms of transivity. All the ego networks had much higher transitivity levels than the erdos-renyi random graph. Similary, the average shortest path is also all greater than the random graph. In terms of average shortest path and transitivity, my ego network fit right in with the stanford samples.

Community Comparisons and Visualizations

Community structure finding is carried out on the 8 sample ego network from the stanford database. We used both the modularity score and walk-trap algorithm. The figures are shown below.

Modularity score community finding algorithm

My Ego Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Erdo-Renyi



Walk-trap community finding algorithm

My Ego Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Sample Network

Erdo-Renyi


Interpretation

It appears that the number of communities detected varies between the sample ego networks. The second network from the top left has the most similar community structure as my ego network. However, there are a lot of communities amongst the other networks. The third network from the top appears to have a giant componenent and many small cliques suggesting that there is a large group of friends with many connections as well as many small groups of friends.

References

McAuley, J., and J. Leskovec. 2012. “Learning to Discover Social Circles in Ego Networks.” NIPS.